Implement codecs.*replace_errors#712

Merged
slozier merged 1 commit into IronLanguages:master from BCSharp:replace_errors
Dec 22, 2019
Conversation

@BCSharp
Member

@BCSharp BCSharp commented Dec 21, 2019

This PR implements replace_errors, backslashreplace_errors and xmlcharrefreplace_errors from codecs. In the positive cases (i.e. valid UnicodeError instances), the functions behave like their CPython counterparts.
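For reference, a minimal sketch of the positive cases as they behave on CPython 3. The codec name, input string and reason text are arbitrary illustrative values, not taken from this PR's tests.

```python
import codecs

# A well-formed error: the character at index 3 ('é') cannot be encoded as ASCII.
err = UnicodeEncodeError("ascii", "caf\u00e9", 3, 4, "ordinal not in range(128)")

# Each handler returns (replacement, position at which encoding should resume).
replaced = codecs.replace_errors(err)
escaped = codecs.backslashreplace_errors(err)
charref = codecs.xmlcharrefreplace_errors(err)

print(replaced)  # ('?', 4)
print(escaped)   # ('\\xe9', 4)
print(charref)   # ('&#233;', 4)
```

In each case the second tuple element equals the error's end index, so the encoder resumes right after the offending range.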

Negative cases are more interesting. CPython seems to coerce invalid range values:

>>> import codecs as cd
>>> cd.backslashreplace_errors(UnicodeEncodeError("X", "abcd", 2, 6, "-"))
('\\x63\\x64', 4)
>>> cd.backslashreplace_errors(UnicodeEncodeError("X", "abcd", -2, 2, "-"))
('\\x61\\x62', 2)
>>> cd.backslashreplace_errors(UnicodeEncodeError("X", "abcd", -2, 6, "-"))
('\\x61\\x62\\x63\\x64', 4)

However, it does not do it consistently:

>>> cd.backslashreplace_errors(UnicodeEncodeError("X", "abcd", 4, 4, "-"))
('\\x64', 4)
>>> cd.backslashreplace_errors(UnicodeEncodeError("X", "abcd", 3, 3, "-"))
('', 3)
>>> cd.backslashreplace_errors(UnicodeEncodeError("X", "abcd", 2, 2, "-"))
('', 2)
>>> cd.backslashreplace_errors(UnicodeEncodeError("X", "abcd", 1, 1, "-"))
('', 1)
>>> cd.backslashreplace_errors(UnicodeEncodeError("X", "abcd", 0, 0, "-"))
('\\x61', 1)
>>> cd.backslashreplace_errors(UnicodeEncodeError("X", "abcd", -1, -1, "-"))
('\\x61', 1)

Of course, an improperly constructed UnicodeEncodeError should not be around in the first place, and CPython seems to adopt the "garbage in, garbage out" approach.
What is important, I believe, is that under no circumstances is an IndexError thrown. This is what I've implemented in IronPython, with slightly more consistent coercion of the start:end range.
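To illustrate what consistent coercion of the start:end range could look like, here is a rough Python sketch. It is not the code from this PR (which is written in C#); `clamp_range` and `backslashreplace_sketch` are hypothetical names, and the clamping rule is an assumption chosen for illustration.

```python
def clamp_range(start, end, length):
    """Coerce (start, end) into a valid slice of a string of the given length.

    Hypothetical helper: the exact rule used by IronPython may differ.
    """
    start = max(0, min(start, length))   # 0 <= start <= length
    end = max(start, min(end, length))   # start <= end <= length
    return start, end


def backslashreplace_sketch(exc):
    """Backslash-escape the clamped error range; never raises IndexError."""
    start, end = clamp_range(exc.start, exc.end, len(exc.object))
    parts = []
    for ch in exc.object[start:end]:
        cp = ord(ch)
        if cp <= 0xFF:
            parts.append("\\x%02x" % cp)
        elif cp <= 0xFFFF:
            parts.append("\\u%04x" % cp)
        else:
            parts.append("\\U%08x" % cp)
    return "".join(parts), end
```

Under this assumed rule an empty or inverted range always yields an empty replacement, so the result depends only on the clamped indices and slicing can never go out of bounds.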

Contributor

@slozier slozier left a comment

Thanks!

@slozier slozier merged commit 5930166 into IronLanguages:master Dec 22, 2019
@BCSharp BCSharp deleted the replace_errors branch December 22, 2019 21:26
2 participants